UWaterloo at NTCIR-9: Intent discovery with anchor text
Abstract
This paper describes our submission to the Intent Discovery task at NTCIR-9. Treating the source and target documents of anchor texts as nodes, we used the anchor texts between them as edges in a documents–anchors graph representation of the corpus. We extracted and indexed anchor link information from the provided SogouT corpus. For each query, matching anchor texts are retrieved from the index; other anchor texts that link to the target documents of the retrieved anchor texts are then retrieved as well. All retrieved anchor texts are ranked and grouped to eliminate duplicates and near-duplicates.
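The abstract does not specify the retrieval model, the ranking function, or the near-duplicate criterion, so the following Python sketch fills those in with hypothetical choices purely for illustration: character-bigram matching (a language-agnostic stand-in for Chinese word segmentation on SogouT), ranking each anchor text by the number of distinct source documents that use it, and greedy grouping by bigram Jaccard similarity. The names `Anchor`, `discover_intents`, `dup_threshold`, and `top_k` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    source: str  # document containing the link (graph node)
    target: str  # document the link points to (graph node)
    text: str    # anchor text (graph edge)


def bigrams(text: str) -> set:
    """Character bigrams; an assumed stand-in for proper word segmentation."""
    s = "".join(text.lower().split())
    return {s[i:i + 2] for i in range(len(s) - 1)} or ({s} if s else set())


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def discover_intents(query, anchors, dup_threshold=0.7, top_k=10):
    # Index anchors by bigram (for query matching) and by target document
    # (for expansion along the documents-anchors graph).
    by_gram = defaultdict(set)
    by_target = defaultdict(set)
    for a in anchors:
        for g in bigrams(a.text):
            by_gram[g].add(a)
        by_target[a.target].add(a)

    # Step 1: retrieve anchors whose text shares at least one bigram with the query.
    hits = set().union(*(by_gram[g] for g in bigrams(query)))

    # Step 2: add every other anchor that points at the same target documents.
    expanded = set(hits)
    for a in hits:
        expanded |= by_target[a.target]

    # Step 3 (hypothetical scoring): count distinct source documents per text.
    # Identical texts collapse here, which removes exact duplicates.
    sources = defaultdict(set)
    for a in expanded:
        sources[a.text.strip()].add(a.source)
    ranked = sorted(sources, key=lambda t: (-len(sources[t]), t))

    # Step 4: greedy near-duplicate grouping; a text joins the first group whose
    # representative is similar enough, otherwise it starts a new group.
    groups = []  # entries: [representative_text, representative_bigrams, score]
    for text in ranked:
        grams = bigrams(text)
        for g in groups:
            if jaccard(grams, g[1]) >= dup_threshold:
                g[2] += len(sources[text])
                break
        else:
            groups.append([text, grams, len(sources[text])])
    groups.sort(key=lambda g: (-g[2], g[0]))
    return [(g[0], g[2]) for g in groups[:top_k]]


# Toy usage example with made-up documents (not SogouT data).
anchors = [
    Anchor("d1", "apple.com", "Apple"),
    Anchor("d2", "apple.com", "Apple Inc."),
    Anchor("d3", "apple.com", "apple inc"),
    Anchor("d4", "fruit.org", "apple fruit"),
    Anchor("d5", "fruit.org", "fruit nutrition"),
    Anchor("d6", "music.com", "Apple Music"),
    Anchor("d7", "cars.com", "used cars"),
]
result = discover_intents("apple", anchors)
for text, score in result:
    print(score, text)
```

In this toy run, "fruit nutrition" is reached only through the expansion step (it shares a target document with "apple fruit"), "apple inc" is folded into the "Apple Inc." group as a near-duplicate, and the unrelated "used cars" anchor is never retrieved. The bigram matching and Jaccard threshold are illustrative substitutes; the actual system may have used a full-text retrieval model and a different duplicate test.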
Similar resources
UKP at CrossLink: Anchor Text Translation for Cross-lingual Link Discovery
This paper describes UKP’s participation in the cross-lingual link discovery (CLLD) task at NTCIR-9. The given task is to find valid anchor texts from a new English Wikipedia page and retrieve the corresponding target Wiki pages in Chinese, Japanese, and Korean languages. We have developed a CLLD framework consisting of anchor selection, anchor ranking, anchor translation, and target discovery ...
HUKB at NTCIR-12 IMine-2 Task: Utilization of Query Analysis Results and Wikipedia Data for Subtopic Mining
Query understanding is the task of identifying the important subtopics of a given query together with their vertical intent. In this task, characteristic keywords extracted from query analysis results and from Wikipedia are used as subtopic candidates. From these candidates, a topic model built on the web documents retrieved by the original query is used to select appropriate subtopics. Vertical intent is ju...
IISR Crosslink Approach at NTCIR 9 CLLD Task
In this paper, we describe our approach to the English-Korean Cross-Lingual Link Discovery (CLLD) task at NTCIR-9. We propose a simple and effective approach to discovering the links. Our method comprises preprocessing steps, anchor-target link mapping, and ranking steps. For discovering the links, we use the English anchor names, the inter-language links, and the translation by the Google Tra...
HITS' Graph-based System at the NTCIR-9 Cross-lingual Link Discovery Task
This paper presents HITS’ system for the NTCIR-9 cross-lingual link discovery task. We solve the task in three stages: (1) anchor identification and ambiguity reduction, (2) graph-based disambiguation combining different relatedness measures as edge weights for a maximum edge weighted clique algorithm, and (3) supervised relevance ranking. In the file-to-file evaluation with Wikipedia ground-truth...
Overview of the NTCIR-10 Cross-Lingual Link Discovery Task
This paper presents an overview of the NTCIR-10 Cross-lingual Link Discovery (CrossLink-2) task. For the task, we continued using the evaluation framework developed for the NTCIR-9 CrossLink-1 task. Recommended links were evaluated at two levels (file-to-file and anchor-to-file), and system performance was measured with the metrics LMAP, R-Prec, and P@N.